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Abstract: Statistical models incorporating change points are common in prac- 
tice, especially in the area of biomcdicine. This approach is appealing in that 
a specific parameter is introduced to account for the abrupt change in the 
response variable relating to a particular independent variable of interest. The 
statistical challenge one encounters is that the likelihood function is not dif- 
ferentiable with respect to this change point parameter. Consequently, the 
conventional asymptotic properties for the maximum likelihood estimators fail 
to hold in this situation. In this paper, we propose an estimating procedure for 
estimating the change point along with other regression coefficients under the 
generalized linear model framework. We show that the proposed estimators 
enjoy the conventional asymptotic properties including consistency and nor- 
mality. Simulation work we conducted suggests that it performs well for the 
situations considered. We applied the proposed method to a case-control study 
aimed to examine the relationship between the risk of myocardial infarction 
and alcohol intake. 



1. Introduction 

The problems of detecting abrupt changes at unknown points and estimating the 
locations of changes are known as the change point problem. The change point 
problem occurs frequently in medical research. For example, cancer incidence rates 
remain relatively stable for people at a younger age, but change drastically after a 
certain age threshold (MacNeill and Mao [13]). The data obtained from a group of 
preschool boys indicates that their weight/height ratio relates to their age in one 
way before a certain age but that the functional relation of the two changes after- 
wards (Gallant [8]). Another example arises from a study of the risk of myocardial 
infarction (MI), which showed a sharp decrease in risk at low alcohol intakes and 
a dramatic increase after reaching a certain amount of daily alcohol consumption 
(Pastor and Guallar [16]). Although these examples each have distinctive features of 
their own, the common theme here is that the relationship of the response variables 
and a covariate of interest is subject to an abrupt change at a certain threshold. 
Very often, scientists are interested in the threshold for clinical or preventive pur- 
poses. The change point model is useful in that it purposely includes a parameter 
to capture the notion of threshold. 
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In this paper, we focus on the estimation of the change point in the so-called 
broken-line regression models, where the regression function is assumed to be con- 
tinuous at the point of change. Estimation for these type of models with normally 
distributed responses has been developed by various authors. Sprent [18] was among 
the first to discuss the estimation of the piecewise linear models. His interest in 
this type of model is based on the observation that a biologist would often postu- 
late a two-phase linear model over some alternatives such as the quadratic model 
largely on intuitive grounds. Hinklcy [10] considered the same two-phase straight 
line model and derived the maximum likelihood estimator (MLE) of the change 
point by its marginal likelihood function and presented the asymptotic distribution 
of the estimator. Feder [7] studied the model in a more general framework and 
proved the consistency of the least-squares estimators of the regression coefficients 
and the change point. The estimators are asymptotically normal for some special 
cases including models with all linear segments. Bhattacharya [2, 3] presented the 
asymptotic properties of the change point and regression coefficient estimators using 
a local log likelihood process approach. Through this approach, he showed the dis- 
tinctive features of the asymptotic properties of the change point with and without 
the continuity constraint at the point of change. 

A major difficulty in estimating the change point as a parameter for regression 
models is the nonsmoothncss of the likelihood function with respect to the change 
point considered as a parameter. Many authors tried to circumvent this problem 
by using various smooth transitions between the two linear regimes, or using other 
types of the function such as the quadratic function in one of the segments separated 
by the change point. This technique was mostly used for the model with normally 
distributed response variables. Gallant and Fuller [!)] discussed such a model and 
used a modified Gauss-Newton method to obtain the least squared estimates. The 
simulation studies showed that the estimates obtained by this method work well, 
but no asymptotic properties were presented in their work. Bacon and Watts [1] 
proposed a model which can accommodate a smooth transition as well as an abrupt 
change with a Bayesian estimation procedure to determine the parameter values. To 
the best of our knowledge, with the exception of normal distributions, there is very 
little research being done where the response variables are categorical. Furthermore, 
the asymptotic properties of the estimators are virtually unavailable. 

In Section 2, we propose an estimation procedure for the change point and the 
corresponding regression coefficients in the framework of generalized linear mod- 
els, and present the asymptotic properties of the proposed estimators. We discuss 
in Section 3 some extensions of the estimation procedure to more general models 
involving a change point. Section 4 discusses the choice of the bandwidth param- 
eter and the smoothing function for the proposed estimation procedure. Section 5 
presents the simulation results to assess the finite sample performance of the pro- 
posed method. In Section 6, we apply the proposed method to a data example from 
the EURopean study on Antioxidant, Myocardial Infarction, and breast Cancer 
(EURAMIC, Kardinaal et al. [12]) followed by discussion in Section 7. The proofs 
of the propositions are given in the Appendices. 

2. Proposed estimation procedure for the change point 

Our approach is closely related to the density estimation, whereas the cumulative 
distribution function (CDF) is estimated by the empirical distribution function 
(EDF), but the density function cannot be estimated by the derivative of CDF, 



On estimating the change point 



307 



which in essence is a step function (Rao [1 7]). In density estimation, a kernel func- 
tion (usually a known density function itself) is introduced to solve this problem by 
making the EDF smooth and differentiable everywhere. The proposed estimation 
procedure is directly motivated by the approach adopted by Horowitz [11] and in 
the Econometrics literature. In particular, Manski [14] showed that the maximum 
score estimator of the coefficient vector of a binary response model is consistent un- 
der a weak distributional assumption. However, the asymptotic distribution of the 
maximum score estimator was not presented since the objective function it maxi- 
mizes is a step function. Horowitz [11] used a smooth version of the same objective 
function to make it continuous and differentiable. The procedure produced an es- 
timator that not only converges more rapidly but also has a tractable asymptotic 
distribution. 

For a sequence of random variables Yi,i = 1, . . . , n, having probability (density) 
function of the form 

(2.1) f(y i ;6)=c(y i )exp(ey i -b(e)), c>0 

with the natural link function, we consider the following model incorporating the 
unknown change point r as 



(2.2) 9 = g(p) =/3 +f3 lX + f3 2 {x-T)+, 

where a+ = al(a > 0) and I(a > 0) is the indicator function, and 02 ^ for 
identifiability of r (Davies [5]). Here /?o and Pi are the intercept and the slope 
relating the response variable Y, through the link function <?, to the covariate x 
for x < r, and $2 is the difference in slopes for the segments before and after the 
change point r. With the traditional likelihood approach, the likelihood function 
is not differentiable with respect to the change point r. Specifically, the indicator 
function I(x > r) is not differentiable with respect to r. Consequently, one of the 
regularity conditions for the usual asymptotic theory, namely, a certain degree of 
smoothness of the objective function with respect to the parameters, is violated. To 
circumvent this critical problem, define a continuous function K(-) which satisfies 

1. |if(w)| < M for some M, < M < oo and u G (—00,00). 

2. lim K{u) = and lim K{u) = 1. 

u— >— 00 u— *oo 

We propose to estimate the change point r as well as the regression coefficients 
/3q, /3i and 02 by maximizing the following objective function: 

n 

Q„(/3,r) = Y}yi{^+^i+^{.Xi-T)K{^^)} 
- b((3 + P1X1 + fo(xi ~ t)K(^—^))] 

n 

= E*(At), 

i=l 

where {h n : n = 1, 2, . . .} is a sequence of positive numbers satisfying lim h n = 0. 

n — >oo 

Here K (•) is analogous to a cumulative distribution function rather than a den- 
sity function, the latter being more commonly used in problems such as density 
estimation. 
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Note that the difference between this objective function Q n and the otherwise 
usual log-likelihood function is that the indicator function I(x > r) in the likelihood 
function is replaced by the smoothing function K((x — r)/h n ). It is clear that the 
objective function Q n is twice diffcrentiable with respect to all parameters, and 
with some suitable conditions on the distribution for x, K((x — T)/h n ) converges 
to I{x > t) uniformly as n — + oo. As shown below, these two factors play key roles 
in the asymptotic behavior of the estimates which maximize Q n . 

Define 



S = (/3,t) = (/3 ,/?i,&,t) 

5) = S7Qn(S) = (- 
£ n (/3,r) = cov /3 , T S'„(/3,T 



c „ n f dQJAr) dQJAr) dQ n (J3,r) dQ n (f3,r) u 

bn{o) = \7Q„(d) = ( — , — , — , ), 

of3 dpi d(3 2 or 



and 

J„(/3,t) = -yS n {8). 

Note that we use the conditions imposed on the limiting configuration of the 
covariate X, similar to those of Bhattacharya [3] . Suppose that there is a function 
F with a = F(t ), for x\ < x<x ■ ■ ■ , < x na < t < x na +i < • • • < x n , and as n — > oo, 



(2.3) H^E 



1 /Lti 

Mi Mi + 



(2-4) (n-na)" 1 V f 1 ^ - f 1 2 ^ 2 2 V 

Following Fahrmeir and Kaufmann [6] , to ensure the asymptotic normality of 8, 
some additional assumptions are needed. Specifically, define a neighborhood of true 
parameter values 8° = , Pi , P% 7 T ° ) as 

iV„M = {J : ml/ 2 Y(8 - 8°)\\ < „}, n = 1,2, ... , 

where u) > 0, £ n is simplified for £ n (<5°), and £«^ 2 is defined as the square root of 
positive definite matrix £ n such that £ n = eV 2 (EJ/ 2 )'. The conditions are 

(I) lim n _ too ^'E n = £, where £ is finite and positive definite. 
(II) For all u) > 0, max ||V^(<$) — III — ► in probability under both measures 

SeN„ (ui) 

P S o and P 5n with 8 n = <5° + w(£~ 1/2 )'A, where V„(<5) = £ r T 1/2 J„(<5)(£^ 1/2 ) t 
and A' A = 1. 

(Ill) g{-) is twice continuously differentiable g(-),g'(-) and <?"(•) are bounded. 

Proposition 1 (Asymptotic normality). Let 8 = (/3, f) 6e £/ie estimators which 
maximize the objective function Q n (f3,r) and 8° be the true value of 8, then under 
some regularity conditions and given that 

(i) Xi is assumed to be bounded, where i = 1, . . . , n; 

(ii) lim P(\X n — t\ < e) = for some e > 0; and 

n — >oc 

1 - 

(iii) lim - V EilogfiY,; [3, r, ar 4 ); [3°, t } < 00 

n— »oc 71 z — ^ 
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the normed estimator for S is asymptotically normal, i.e., 

^ n - 1/2 Jn0o - P° , Pi ~ Pi, r Y £ N(0, U). 

Or equivalently, by the results of Lemma B. 1 in Appendix B. 

v^($ - Pi $i-ftj 2 -ft,T- T°y 5 jv(o, s- 1 ). 

Proof of Proposition 1 is given in the Appendices. 

3. Some extensions 

The model (2.2) discussed so far involves two straight lines, which is characterized in 
the literature as broken line regression or joint point models. Straight line describes 
the abrupt change mechanism more distinctively than other types of models. As 
discussed in the introductory section, some authors (e.g., Pastor and Guallar [16]; 
Gallant [8]) used a quadratic-linear or linear-quadratic model to characterize the 
change point. For example, a linear-quadratic model is expressed as 

(3.1) g(jn) = Po + Pim + fa(xi - T) 2 I(xi > t). 

A quadratic-linear model can be expressed similarly by changing the indicator func- 
tion to I(xi < t). The advantage of this type of model, specifically, is that for 
estimation purposes the likelihood function has the first derivative with respect 
to the change point. Thus the Fisher information can be derived as the covari- 
ance of the score function. However, the usual asymptotic properties still cannot 
go through. 

Since both linear-linear and linear-quadratic (or quadratic-linear) models face 
the same non-differentiability problem in r, the approach proposed in section 2 can 
be easily adopted and extended to the latter situation. 

Corollary 1. Replacing the term Xi — r by (xi — r) 2 in Q n , and with the regularity 
conditions tailored to such a replacement, the resulting estimators of (J3q, 0i, Pi, t) 
for model (3.1) are consistent and asymptotically normal. 

The proof is similar to that of the linear-linear model and is omitted. In model 

(2.2) , we have assumed that there is only one independent variable, which involves 
the change point. This method can be easily extended to situations where adjust- 
ment for additional independent variables is needed. Specifically, suppose there are 
k additional variables Z\ , . . . , Zk, the systematic component part of a generalized 
linear model can be modified as 

g{fH) = Po + PxXi + P%{xi — r)+ + 7i zi H h 7fcz fc . 

The asymptotic properties of the estimators for 7 and 5 are similar to those in 
model (2.2) without additional independent variables and are thus omitted. 

4. Computational issues 

The estimation of the parameters (Pq, P\,Pi, t) involves the value of the bandwidth 
parameter h n . For a smoothed maximum estimator of a binary response model, 
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Horowitz [11] suggested choosing bandwidth h n oc n 1 /( 2m + 1 ) ; where m is the 
order of the kernel K' (•) defined as the integer satisfying the following 



(4.1) / v l K'(v)dv 



0, if i < 771, 

d(nonzero), if i = m. 

In our asymptotic analysis, with K{-) appearing as a single term in some of 
the elements of the hessian matrix, the form of h n does not appear to affect the 
asymptotic properties of the estimators as long as h„ — * when n — * oo and meets 
the restriction as follows. 

A sufficient condition for the uniform convergence of K{{xi—T)/h n ) — > I(xi > r), 
is (xi — r)/h n — » ±oo for all z = 1, . . . , n. Generally, in order to satisfy the above 
conditions, we need to have h n << min(|xi — Xj\, 1 < i,j < n). Specifically, in the 
iteration procedure to estimate the parameters, h„ needs to be small enough so 
that the negative hessian matrices arc positive definite. 

It is recognized in the density estimation literature that the choice of the smooth- 
ing function does not affect the asymptotic properties of the density estimator. In 
the simulation that follows, we are listing the results where K(-) is chosen as the 
cumulative distribution function of the standard normal distribution. We have also 
conducted the simulation with K(-) as the cumulative distribution function of the 
exponential distribution. Results concerning the distribution of the estimators and 
the actual estimated values are very similar for the two different K(-). However, for 
the same n, h n can be chosen as a function of n that approaches in a slower rate 
with K(-) being the exponential cumulative distribution function for the algorithm 
to converge. This is consistent with the conditions (a) and (b) in Lemma B.l, i.e., 
{(x-T)/h n }K'((x-T)/h n ) -> and {{x - t) / h n } 2 K" {(x - t) / h n ) -► as n -> oo. 
Note that x — t is bounded. This is equivalent to requiring h n to converge to in 
a relatively slower rate than K'{po) and K"(oo). Since the exponential density at 
the tail converges to at a slower rate than the normal density, the corresponding 
h n can be chosen this way as well. 

We have found in our simulations that the objective function Q n when evaluated 
at the true values /?§, 0±, 0$, a function of r, is rather flat around the true change 
point value r°. In dealing with this issue, aside from having a small tolerance level 
(10 -5 at most) for convergence, another crucial issue in actually carrying out the 
estimation algorithm is to choose the initial values of the parameters PotPi,P2 
and r. In our simulations in section 5 and the example in section 6, for the data 
generated, we first made some graphs of nonparametric models such as the LOESS 
model, then we chose a number of tj, i = 1, . . . , J, which on the graph are near the 
potential change point. Next we consider the GLM models. With the fixed change 
point valued at Tj, the corresponding /3(rj) values were then obtained using ordinary 
GLM fitting software. t p = argmax Ti Q„(/3(ri), t{) is chosen as the initial estimate 
for r, and (3(t p ) as the initial values for (3 = (0o, /3±, fa). 

After replacing the indicator function in the likelihood function with the smooth- 
ing function K(-), the model can be categorized as having nonlinear parameters. 
For easy calculation, according to section 11.4 in McCullagh and Ncldcr [15], model 
(2.2) can be approximated by 

g(fXi) = Po + PiXi + fi 2 Ui + cvi, 

where u = (x - t )K(^), v = ~{K{^-) + ^K'(^)} and r is an initial 
value of r, which supposedly is close to true r. 
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After this approximation, the coefficient estimates as well as their covariance 
matrix can be easily obtained by regular software. The estimated r is then f = 
To — c/02, and the approximate standard error by the delta method is 
{(l//3 2 , — c//3 2 )cov(c, $%) (l/fo, — c//3|) f )} 1 ^ 2 . As illustrated in our simulations, the 
value of the standard error obtained by this approach is similar to the one obtained 
by the formula supplied in Proposition 1. Hence, from a practical viewpoint, the 
proposed method is readily implemented in software packages such as SAS and R. 

5. Simulation studies 

Simulation studies were conducted to examine the finite sample performance of 
the proposed estimating procedure. Specifically, normal data with identity link and 
binary data with logit link incorporating a change point were considered in the 
simulations. 

For the general linear model, we assume that the response variable is normally 
distributed with constant variance a 2 = 1, and that the independent variable X fol- 
lows a uniform distribution with a range of [—2, 2]. We generated the data according 
to 

fj,(x) = E(Y\x) = 2 + 3x-5{x- 0.5)+. 

In this model, the change point in which the relationship between Y and x changes 
is set at r = 0.5. For each of the 1000 replications, we generated a sample of n=500 
independent observations from the distribution N(fi(x), 1). The mean, median and 
standard deviation of 6 proposed in section 2 as well as their average standard 
errors calculated based on Proposition 1 are presented in Table 1. As discussed in 
Section 4, results of the estimated values as well as the precision of the estimates are 
not sensitive to the choice of the smoothing function. The smoothing function used 
in this simulation is K(u) = l/(27r) 1 / 2 J" e~* t 2 dt, and the smoothing parameter 
h n is set to be n~ 2 . The average s.e. by Delta method in the table is referring to the 
standard error obtained via the formula discussed in section 4. Note that the mean 
and median of the estimates are very close to the true values. This, along with the 
fact that the average standard errors are very similar to the corresponding sample 
standard deviation of the estimates suggests that our estimators converge to the 
true values and the normal approximation for the distribution of the estimators is 
valid. 

The empirical coverage probabilities for 8, which in this simulation are the pro- 
portion of the 95% confidence intervals (calculated by the usual estimate ±1.96sc) 



Table 1 

Estimates of the change point and the (5 1 s from 1000 simulated samples of 500 observations for 
normal response variable with identity link 



Parameters 


/3o 


01 


/3 2 


T 


True Value 


2 


3 


-5 


0.5 


Mean 


1.980 


2.983 


-4.977 


0.503 


Median 


1.978 


2.983 


-4.972 


0.505 


S.D. 


0.074 


0.076 


0.208 


0.039 


Average s.e. by Proposition 1 


0.082 


0.078 


0.188 


0.037 


Average s.e. by Delta method 


0.076 


0.074 


0.202 


0.040 


Empirical coverage probability(%) 










of normal CI 


95.9 


95.4 


92.8 


94.2 


Empirical coverage probability 










of bootstrapped CI (%) 


92.6 


93.3 


93.0 


94.4 
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Fig 1. Q-Q plots of the sampling distribution of the estimates from Table 1. 



for the 100 random samples containing S, by and large, are close to the nominal 
95% level. In addition, the bootstrapped confidence intervals are also obtained fol- 
lowing the bootstrap resampling scheme suggested by Boukai [4]. For each of the 
random samples (Xi,Yi),i = 1, ...,n, first the change point r as well as other 
parameters were estimated according to the proposed estimation procedure. The 
bootstrapping resample (X* ,Y*),i = l,...,n was then generated in such a way 
that {X*,Y*),i < [f] are from (Xi,Yi),i < [f], and the rest {X*,Y*), [f] < i < n 
are from (Xi,Yi), [f] < i < n, where [a] is defined as the nearest integer to a. We 
obtained 1000 such resamples for each of the 1000 original random samples, and 
then the empirical 95% bootstrap confidence interval was obtained following the 
application of the proposed estimation procedure to these resamples. The propor- 
tions of these 1000 bootstrap confidence intervals containing the true parameter 
values of i5 are listed in Table 1 as the coverage probabilities of the bootstrapped 
CPs, which are reasonably close to the coverage probabilities of the normal CI and 
hence the nominal 95% level. 

The Q-Q plots for the estimated f3 and the change point r in Figure 1 also 
show that the sampling distributions of these estimators are close to the normal 
distributions. Results in Table 1 and Figure 1 suggest that the asymptotic normality 
of the estimators is well supported by the findings via this simulation. 

For the logistic regression, we generated the regressor X from a uniform distri- 
bution U{— 2, 2), and the binary response variable Y according to 

pr(Y = l\x) = 1/[1 + exp{-(2 + 3x - 5(x - 0.5)+)}]. 

As in the normal response case, we also generated a random sample of 500 from 
this model with 1000 replications. The smoothing parameter used here is h n = n~ 3 . 
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Table 2 

Estimates of the change point and the (3's from 1000 simulated samples of 500 observations for 
the binary response variable with logit link 



Parameters 


Po 


Pi 


Pi 


T 


True Value 


2 


3 


-5 


0.5 


Mean 


2.029 


3.036 


-5.058 


0.490 


Median 


2.014 


3.023 


-5.037 


0.487 


S.D. 


0.282 


0.333 


0.647 


0.146 


Average s.e. by Proposition 1 


0.273 


0.322 


0.668 


0.145 


Average s.e. by Delta method 


0.280 


0.329 


0.642 


0.154 


Empirical coverage probability(%) 










of normal CI 


94.9 


95.5 


95.3 


96.0 


Empirical coverage probability 










of bootstrapped CI (%) 


93.0 


94.5 


93.8 


94.9 



histogram of estimated p(0) histogram of estimated 3(1 ) 




-3-2-10123 -3-2-1012 
Theoretical Quantiles Theoretical Quantiles 



histogram of estimated [3(2) histogram of estimated t 




-3-2-10123 -3-2-10123 
Theoretical Quantiles Theoretical Quantiles 



Fig 2. Q-Q plots of the sampling distribution of the estimates from Table 2. 



Results as shown in Table 2 and Figure 2 suggest that similar conclusions can be 
obtained for binary data as well. 



6. An example: The EURAMIC study 



In this section, we applied, as an illustration, the proposed estimating method in 
Section 2 to the data from the EURAMIC study (EURopean study on Antioxidants, 
Myocardial Infarction, and Breast Cancer) . The EURAMIC study (Kardinaal et al. 
[12]) is an international case-control study conducted in eight European countries 
and Israel, which was designed primarily to evaluate the association of antioxidants 
with the risk of developing a first myocardial infarction (MI) in men aged older 
than 70. Our example focuses on the portion of the data involving the dose-response 
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Table 3 

Parameter estimates and their standard error estimates (when available) and 95% confidence 
intervals relating alcohol intake to the risk of MI in the EURAMIC study 



Parameter 
(s.e.) 

(95% C.I.) 




01 


02 


T 


Pastor and Guallar [16] 




0.008 


0.009 


13.118 






(0.000,0.021) 


(0.001,0.084) 


(4.679, 50.062) 




-11.64 


0.008 


0.009 


13.10 


Proposed method 


(1.49) 


(0.004) 


(0.007) 


(4.68) 




(-14.57, -8.71) 


(0.0002,0.016) 


(-0.005,0.022) 


(3.94, 22.27) 



of alcohol intake and risk of myocardial infarction. For comparison purposes, we 
used the same sample as in Pastor and Guallar [16], who entertained the change 
point model. In this sample, there are 330 cases who had a confirmed diagnosis of 
first acute MI and 441 controls who were obtained through several random sample 
schemes. The primary risk factor is alcohol intake during the year before the study 
took place. It is well-recognized in the literature that the risk of MI as a function 
of alcohol intake is J-shaped. To capture this phenomenon, Pastor and Guallar [16] 
have fitted a quadratic-linear model to relate the risk of MI, in logit scale, to alcohol 
intake, adjusting for other covariatcs including age, smoking status, waist-hip ratio, 
history of diabetes, history of hypertension, family history of coronary disease and 
the dummy variables identifying the medical centers where the data were obtained. 

Tabic 3 shows the estimates and their confidence intervals reported by Pastor 
and Guallar [16]. The approach they took is likelihood ratio-based using the original 
likelihood which has only the continuous first derivative with regard to r. In their 
approach, the authors estimated the regression coefficients and r by maximizing 
the likelihood function without smoothing. Recognizing the breakdown of the con- 
ventional asymptotic results, they computed the likelihood ratio-based confidence 
intervals for r and other parameters. 

Also presented in Table 3 are results using the proposed estimating method. Here 
K (•) is chosen to be the cumulative distribution function of the standard normal 
random variable, and h n = n~ 3 , where n = 771, the total sample size for this 
data set. Both approaches give rise to very similar point estimates for the change 
point t as well as the (3 coefficients. Specifically, the threshold for which the risk of 
MI changes its direction is estimated at 13.1(±4.68) grams per day. However, the 
confidence intervals of r are rather different from each other. While the approach 
by Pastor and Guallar [16] did not completely address the problem resulting from 
the nonsmoothness of the likelihood function with respect to the change point 
parameter, our estimated confidence interval for r (f± 1.96s.e.) is based on a valid 
asymptotic theory and, at least for this example, provides a much tighter interval for 
investigators to pinpoint the point where the well-known protective alcohol intake 
effect may be reversed. 

7. Discussion 

Many approaches have been suggested to capture the phenomenon of abrupt change 
in relating the response variable to a particular independent variable. These include 
non-parametric smoothing and transforming or categorizing the continuous inde- 
pendent variables, etc. Like other approaches, the change point model is unlikely 
to fully capture the underlying mechanism. This approach, however, is appealing 
in that a specific parameter, t in this case, is introduced to quantify the scientific 
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objective of interest. In the alcohol intake and risk of MI example discussed above, 
by estimating the threshold for alcohol intake where the risk of MI is apparently 
heightened and the direction altered, the clinician is in a position to inform pa- 
tients at risk the maximum allowable level of daily alcohol consumption. In fact, 
clinical advice as such is given daily to patients at risk for various diseases. There- 
fore, obtaining more precise estimates for these threshold values via the change 
point models rather than giving a simple cutoff point via conventional wisdom or 
experience would have significant clinical implications for health care practice. 

On the other hand, the introduction of the change point parameter into the 
statistical model brings intriguing theoretical difficulties in both detection and es- 
timation of such phenomenon. Our primary objective in this paper is to provide 
an estimate for the change point with desirable asymptotic properties under the 
GLM framework. The consistency and the asymptotic normality of the proposed 
estimators, whose variance can be easily estimated, enhances the opportunity for 
researchers to make statistical inference on the change point parameter. 

Our approach of using the modified objective function eliminates the nonsmooth- 
ness problem with the change point parameter in the likelihood function. Similar 
modification can be applied to the situation where only the first two moments of the 
response variable is available, and the full knowledge of the probabilistic mechanism 
is absent. This work will be reported elsewhere. 

Appendix A: Proof of consistency 

First, we state the consistency of the estimators, which is needed in the proof 
asymptotic normality, as 

Proposition A.l (Consistency). Let 8 = (/3, r) be the estimators which maximize 
the objective function Q n (f3,r) and 8° be the true value of 8, then under some 
regularity conditions and given (i),(ii) and (Hi), 8 converges in probability to 8°, 

i.e., 8^8° = (J3°,t°). 

To show the consistency of the estimators maximizing the objective function 
Qn(0-> T )i we need the following lemma. 

Lemma A.l. For a sequence of random variables satisfying (2.2), with the same 
conditions as in Proposition A.l, 

(A) sup -\Q n (/3,T) - l n (/3,T)\ 4o. 

Proof. First, under the condition lim P(\X n — r| < e) = 0, e > 0, there exists a 

71 — VOO 

N > 0, s.t. for n > N, P(\X n - r| < e) < T), for any n > 0. K{^f^) -> I{x t > r) 
uniformly for z > TV. In addition, if |JQ — r| > e and K(-) satisfies equation (4.1), 
it can be easily shown 

(A.l) K(^f^)=I(xi>T)+o(h^). 

fin 

Note that b'(-) = cT^-), 

D = -(Q n (f3,r)-l n (f3,r)) 
n 

= - E( r * - 9-\P))W*i - r)(K{-L-^) - I{ Xi > r)), 
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where \6* - (#, + f3 1 + f3 2 ( Xl - r)+)| < \fo(K{^) - I{ Xi > r))|. 

1 N - 
D = -Y / (Y*-9~ 1 (0*))M^~T)(K(?^)-I(x l >T))I(\x. l ~T\<e) 

i—l 

+ 7, J2 ( Kl ~ d^iODfci^ - t){K{^——) - I( Xi > t))I(\ Xi -r| < e) 
+ - E( y * - - r)(iT(^— -) - I( Xi > r))I{\ Xi - r| > e). 

4 — 1 

Since var{Yi) < oo, by the weak law of large numbers, 

^ n 1 n 

- V[k, - B(y t )] 4 iim - V Bp* - £(!■)] < oo. 

n z — ' n — >oc 77, *■ — ' 



n * — ' n— »oo rt 

i=l i=l 



With some constant C > 0, combining this with the condition P(\X n — r| < e) — > 
as n — + oo, 



JV 



sup-|Q n (Ar)-Z n (Ar)| < sup -| V[y, - s" 1 ^*)]] • 2C ■ 1 

n 

+ sup -| J2 [Yi-g-\e*)]\-2C- v 

((3,r) n i=N+1 
n 

+ sup-|^[K i - 5 - 1 r)]|.C-o(C)-l- 

Since N is finite, the first term converges to as n — > oo, and ry can be made 
arbitrarily small, hence we have 

sup -\Q n (/3,T)-l n (/3,T)\ 4o. □ 
09,r) n 

Proof of Proposition A.l. The log likelihood function for the observed Y's is 

Z„(/3,t) = ^logf{Y i ;f3,T,x l ) 
i=i 

= ^K{/3 +/3iX l +/3 2 (x l -r) + } 

i=l 

- 6(A) + fta* + (3 2 {x l - r)+) + 2ogc(K t )]. 

By the weak law of large numbers, i X)"=i^°5/(^i T > x i)~E{logf(Yi; (3, r, Xi);p°, 

r°}} ^ 0. 
Let 

1 " 

Z (/3,r) = lim - Y^E^ogfiYcP^Xi);^,^}. 
i=l 



On estimating the change point 317 

By Jensen's inequality, and since 

exp(Y(P + fax + fa(x - r)+) - b(fa + fax + fa{x - r)+) + c{Y)) 



E{ 



exp(Y(P° + [3°x + $(x - r°)+) - &(#} + /3°z + p°(x - r°)+) + c(Y)) ' 

/?V°} = 1, 



then 

exp(r(/3 + fax + ft (a - r)+) - 6(/3 + ftx + fa(x - r)+) + c(F)) 



E{-log{ 



exp{Y(p° + fix + fa 2 {x - r°)+) - + + /3 2 °(a; - r°) + ) + c(F)) /! 

/?V°} 

> 0. 
Hence we have: 

(B) Iq((3,t) maximizes at the true parameter values (/3°,t°). 

(C) If X's are bounded, and (/3, r) G is a compact set, it follows that since Y((3q + 
faX + fa(x - r)+) - b(fa + fax + fa{x - t)+) < M\Y\ + N and E[\Y\] < oo, 
similarly, as in the proof for consistency of maximum likelihood estimator, we 
have 

(D) sup\-l n (J3,T)-l (J3,T) \ 4o. 

(At) U 

(E) lo((3, t) is continuous in (/3, r). 

By (A), (D) and the triangle inequality, 

sup|-Q n 09,r)-Zo03,r)| < sup |-(Q„(/3,r) - l n (P,r))\ 
09,r) n 03,r) n 

+ sup|-;„(/3,r)-/ (/3,r)| 4o 

(At) " 

This implies that Q n ((3, r) satisfies the hypothesis of Theorem 4.1.1 of Amcmiya 
(1985). Hence we have 5^5°. □ 

Appendix B: Proof of asymptotic normality 

In this appendix, we prove Proposition 1. We first introduce the expressions of 
Hessian matrix elements and some lemmas needed to prove Proposition 1. Aside 
from the conditions stated in the lemmas, the same regularity conditions as in 
Proposition 1 are implied as well. 

For notational purposes, let 0i = fa + faxi + faixi — r) K( Xi t ~ T ), then 



i=\ 



09, 85 



and 



where 



Jn{0)- d& 2 d5 ^Q 5 > + 9 0. &J2>> 



ggi(j) d 2 qi (6) -lyz/M 
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= (1, x u ( Xi - r)K{*fl), -p 2 (K(^) + ELZlif'(fLII)))*, 
Ob h n h n h n h n 

and the nonzero elements of the matrix are 
d 2 6.,,„ d 2 9 



[3,4] = ^[4,3] = -{Jr(V) + V^^'(V^)}. 



^2 L ' J ^2 L ' J 

/i n h n 
l -[4, 4] = (3 2 {-K'(^—) + ^—K"{^— )}. 



<9<5 2 ' ft-n h 2 h n 

Lemma B.l. Suppose that 

(a) sup u K'(u) < M < oo, and \u\K'(u) — > as \u\ — > oo, 

(b) sup u K"(u) < M < oo, and u 2 K"(u) — > as |u| — > oo, 

and under conditions (I), (II) and (III), matrix ij„ converges element-wise in 
probability to 

lim — E„ = S. 

n — >oc 7i 

Proof. Note that there are basically three different types of terms that need to be 
considered in both matrices J n /n and E„/n. First, ^ Y^^iid' 1 )' (®i) x i (^h~ L ) f° r 
= 0, 1, 2. Under the conditions (i), (ii) and condition (III) on the link function g(-) 
along with equations (2.3) and (2.4), the above terms converge to finite numbers 
as n — > oo, with the same arguments in proof of Lemma A.l. 

Similarly, by condition (a) and the similar argument in Lemma A.l, as n — > oo, 
the term 

1 " _ — 

i— 1 

Lastly, note that for the natural link function, most elements of matrices E n 
and J n are identical and free of Y's except elements J„[3, 4], J„[4, 3] and J„[4,4], 
which arc d 2 Q n ((3, t)/9/3 2 9t, d 2 Q n (f3, t)/9t<9/3 2 and d 2 Q n ((3, t)/3t 2 , respectively. 
Both the terms in d 2 Q n (f3 , t) / d fodr and in d 2 (J n (/3, t)/<9t 2 involving Y's can be 
shown to approach as n — * oo. Therefore, iS„ and — J n converge to the same 
matrix E. □ 

Lemma B.2. Under conditions (I) and (II), the normed estimating function is 
asymptotically normal, 

Proof. For fixed co > and the unit vector A, A'A = 1, we have the sequence 
S n = d° + u(E~ 1/2 ) f A e N„{cu). The Taylor expansion of the objective function is 

Q n (6 n ) = Q n (S ) + (6 n -S ys n -(S n -S yj n (6 n )(S n -S )/2, 
\\~8-5°\\ < \\S n -6°\\. 

Taking exponential and rearranging, 

exp{(\'V n (6 n )^ 2 /2) + Q n (S n )} = e^^A'E" 1 / 2 S n ) + Q n (S )}. 

Hence, 

e x P {(AV n (^)A^ 2 /2)} e ^" ( ^ )} L ra (J ra ) 
/g j\ s.xp\l n [d n )} 

u >/ v -i/2 C u exp{Q n (S )} 

= * S " 5n)} exp{l n (6°)} Ln{6 } ' 
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where L n (-) denotes the likelihood function of Y. By Lemma A.l, sup — \Q n (S) — 

l n (S)\ — > 0, and by condition (II), we have for any e > 0, there exists n\ > 0, such 
that for n> ni, 

\exp{X'V n {8 n )Xuj 2 /2) - exp(u 2 /2)\ < e, 

hence 

\E[exp{\'V n {8 n )\Lu 2 /2)] - exp(u 2 /2)\ 

< E[\exp(X'V n {8 n )Xuj 2 /2) - ea;p(w 2 / 2 )l] < e - 

The expectation of the left side of (B.l) converges to the moment-generating func- 
tion of the standard normal. It follows that the expectation of the right side of (B.l) 
also converges to exp(oj 2 /2). Thus, A'S n 1/,2 S n is asymptotically standard normal, 
that is 

E-^Sn^N^h). 
Finally, proof of Proposition 1 completes with 

= S n (S) = S n - J n (S*)(S - 5°), \\5* - 8°\\ <\\S- S°\\, 

Lemma B.l along with condition (II), and consistency of S stated in Proposition A.l. 

□ 
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